神经网络的鲁棒性和异常检测能力是其在现实世界中安全采用的关键主题。此外,最近网络的过度参数伴随着高计算成本,并提出了有关其对稳健性和异常检测的影响的疑问。在这项工作中,我们表明稀疏性可以使网络更强大,更好的异常检测器。为了进一步激励这一点,我们表明,预先训练的神经网络包含在其参数空间内,稀疏的子网络在没有任何进一步培训的情况下在这些任务上更好。我们还表明,结构化的稀疏性极大地有助于降低昂贵的鲁棒性和检测方法的复杂性,同时维持甚至改善其在这些任务上的结果。最后,我们引入了一种新方法Sensnorm,该方法使用从适当的修剪方法得出的权重的灵敏度来检测输入中的异常样品。
translated by 谷歌翻译
修剪是稀疏深神经网络的任务,最近受到了越来越多的关注。尽管最先进的修剪方法提取了高度稀疏的模型,但它们忽略了两个主要挑战:(1)寻找这些稀疏模型的过程通常非常昂贵; (2)非结构化的修剪在GPU记忆,训练时间或碳排放方面没有提供好处。我们提出了通过梯度流量保存(早期CROP)提出的早期压缩,该压缩在训练挑战(1)的培训(1)中有效提取最先进的稀疏模型,并且可以以结构化的方式应用来应对挑战(2)。这使我们能够在商品GPU上训练稀疏的网络,该商品GPU的密集版本太大,从而节省了成本并减少了硬件要求。我们从经验上表明,早期杂交的表现优于许多任务(包括分类,回归)和域(包括计算机视觉,自然语言处理和增强学习)的丰富基线。早期杂交导致准确性与密集训练相当,同时超过修剪基线。
translated by 谷歌翻译
鉴于他们的普及和应用程序的多样性,图形神经网络(GNNS)越来越重要。然而,对对抗性袭击的脆弱性的现有研究依赖于相对较小的图形。我们解决了这个差距并研究了如何在规模攻击和捍卫GNN。我们提出了两个稀疏感知的一阶优化攻击,尽管优化了在节点数量中的许多参数上优化了有效的表示。我们表明,普通的替代损失并不适合全球对GNN的攻击。我们的替代品可以加倍攻击力量。此外,为了提高GNNS的可靠性,我们设计了强大的聚合函数,软中位,导致所有尺度的有效防御。我们评估了我们的攻击和防御与图形的标准GNN,与以前的工作相比大于100倍以上。我们甚至通过将技术扩展到可伸缩的GNN来进一步缩放一个数量级。
translated by 谷歌翻译
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
translated by 谷歌翻译
Tobacco origin identification is significantly important in tobacco industry. Modeling analysis for sensor data with near infrared spectroscopy has become a popular method for rapid detection of internal features. However, for sensor data analysis using traditional artificial neural network or deep network models, the training process is extremely time-consuming. In this paper, a novel broad learning system with Takagi-Sugeno (TS) fuzzy subsystem is proposed for rapid identification of tobacco origin. Incremental learning is employed in the proposed method, which obtains the weight matrix of the network after a very small amount of computation, resulting in much shorter training time for the model, with only about 3 seconds for the extra step training. The experimental results show that the TS fuzzy subsystem can extract features from the near infrared data and effectively improve the recognition performance. The proposed method can achieve the highest prediction accuracy (95.59 %) in comparison to the traditional classification algorithms, artificial neural network, and deep convolutional neural network, and has a great advantage in the training time with only about 128 seconds.
translated by 谷歌翻译
Accurate modeling of ship performance is crucial for the shipping industry to optimize fuel consumption and subsequently reduce emissions. However, predicting the speed-power relation in real-world conditions remains a challenge. In this study, we used in-service monitoring data from multiple vessels with different hull shapes to compare the accuracy of data-driven machine learning (ML) algorithms to traditional methods for assessing ship performance. Our analysis consists of two main parts: (1) a comparison of sea trial curves with calm-water curves fitted on operational data, and (2) a benchmark of multiple added wave resistance theories with an ML-based approach. Our results showed that a simple neural network outperformed established semi-empirical formulas following first principles. The neural network only required operational data as input, while the traditional methods required extensive ship particulars that are often unavailable. These findings suggest that data-driven algorithms may be more effective for predicting ship performance in practical applications.
translated by 谷歌翻译
As a common appearance defect of concrete bridges, cracks are important indices for bridge structure health assessment. Although there has been much research on crack identification, research on the evolution mechanism of bridge cracks is still far from practical applications. In this paper, the state-of-the-art research on intelligent theories and methodologies for intelligent feature extraction, data fusion and crack detection based on data-driven approaches is comprehensively reviewed. The research is discussed from three aspects: the feature extraction level of the multimodal parameters of bridge cracks, the description level and the diagnosis level of the bridge crack damage states. We focus on previous research concerning the quantitative characterization problems of multimodal parameters of bridge cracks and their implementation in crack identification, while highlighting some of their major drawbacks. In addition, the current challenges and potential future research directions are discussed.
translated by 谷歌翻译
Two approaches to AI, neural networks and symbolic systems, have been proven very successful for an array of AI problems. However, neither has been able to achieve the general reasoning ability required for human-like intelligence. It has been argued that this is due to inherent weaknesses in each approach. Luckily, these weaknesses appear to be complementary, with symbolic systems being adept at the kinds of things neural networks have trouble with and vice-versa. The field of neural-symbolic AI attempts to exploit this asymmetry by combining neural networks and symbolic AI into integrated systems. Often this has been done by encoding symbolic knowledge into neural networks. Unfortunately, although many different methods for this have been proposed, there is no common definition of an encoding to compare them. We seek to rectify this problem by introducing a semantic framework for neural-symbolic AI, which is then shown to be general enough to account for a large family of neural-symbolic systems. We provide a number of examples and proofs of the application of the framework to the neural encoding of various forms of knowledge representation and neural network. These, at first sight disparate approaches, are all shown to fall within the framework's formal definition of what we call semantic encoding for neural-symbolic AI.
translated by 谷歌翻译
Diffusion models have shown a great ability at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly as they require to run a neural network for each reverse diffusion step, whereas predictive approaches only require one pass. As diffusion models are generative approaches they may also produce vocalizing and breathing artifacts in adverse conditions. In comparison, in such difficult scenarios, predictive models typically do not produce such artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach where an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables to use lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus lifting the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
translated by 谷歌翻译
Market sentiment analysis on social media content requires knowledge of both financial markets and social media jargon, which makes it a challenging task for human raters. The resulting lack of high-quality labeled data stands in the way of conventional supervised learning methods. Instead, we approach this problem using semi-supervised learning with a large language model (LLM). Our pipeline generates weak financial sentiment labels for Reddit posts with an LLM and then uses that data to train a small model that can be served in production. We find that prompting the LLM to produce Chain-of-Thought summaries and forcing it through several reasoning paths helps generate more stable and accurate labels, while using a regression loss further improves distillation quality. With only a handful of prompts, the final model performs on par with existing supervised models. Though production applications of our model are limited by ethical considerations, the model's competitive performance points to the great potential of using LLMs for tasks that otherwise require skill-intensive annotation.
translated by 谷歌翻译